1. Introduction

The National Virtual Observatory (NVO) will directly or indirectly touch upon 
all steps in the process of transforming raw observational data into "meaningful" 
results. These steps include:

(1) Acquisition and storage of raw data.

(2) Data reduction (i.e. translating raw data into source detections). 
(3) Acquisition and storage of detected sources.

(4) Multi-sensor/multi-temporal data mining of the products of steps (1), (2)
and (3).

The highly distributed nature of the NVO places new twists on all of these steps. 
Future NVO research is likely to focus on developing the software tools necessary
for Step (4) as well as the methods for "federating" data from Steps (1) and (3).
However, past experience with individual surveys indicates that Step (2) has 
dominated computer software and hardware costs and may have a large impact 
on the NVO. 

Federation of data sets from multiple institutions, which is a primary NVO 
goal, will be made significantly easier if improvement of the data reduction 
pipeline software is also undertaken. Addressing the challenges of Step (2) for the 
NVO can be accomplished by significantly improving the software environment 
for data reduction pipelines. Although the NVO can and should influence this
effort, it may be outside of the NVO's core activities.

The rest of this paper presents a further analysis of the computing and 
networking requirements of the NVO and provides a discussion of some of the 
challenges and solutions for addressing data reduction for these massive NVO 
data sets. 

2. Large Survey Requirements 

Large area imaging surveys are generating data at an exponentially increasing 
rate. Reducing this data is a significant hardware and software challenge that 
is likely to push the limits of computing for some time to come. The computing
requirements of these surveys are best understood in terms of the number of 
operations required to process a single pixel or a single detection. By looking 
at the data reduction pipeline in these terms it is easy to determine the overall 
computing needs of a given data stream by multiplying the data rate by the per 
pixel processing requirement. 

2.1. Computing Requirements 

Figure 1 shows the steps that are common to most data reduction pipelines. 
The primary driver is the matched filtering step which involves 2D convolutions 
of the entire image. Such convolutions can easily result in as many as 10^4
operations per pixel and require a sizable amount of computing to keep up
with the data rate. Detection processing also involves 2D convolutions but of a 
sparser nature. Although the real-time processing requirements of a survey may 
be readily satisfied by a few tens of computers, it is quite common to re-run the 
data processing pipeline on several years' worth of data after the pipeline software
has been upgraded. In such cases, hundreds or even thousands of computers will 
be needed to re-process the data in a reasonable time. 
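
The per-pixel cost of matched filtering can be illustrated with a minimal sketch of a direct 2D correlation. This is not the code of any particular pipeline: the function names are hypothetical, and the assumption that the filter is applied as a direct (non-FFT) sum over a square kernel is ours. Under that assumption each output pixel costs one multiply-add per kernel element, so a 100 x 100 kernel gives the 10^4 operations per pixel quoted above.

```python
def matched_filter(image, kernel):
    """Directly correlate `image` with `kernel` over the fully overlapping region.

    Both arguments are lists of rows of numbers. Returns the filtered image
    and the number of multiply-adds performed.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    multiply_adds = 0
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    acc += image[y + j][x + i] * kernel[j][i]
            out[y][x] = acc
            multiply_adds += kh * kw
    return out, multiply_adds


def multiply_adds_per_pixel(kernel_size):
    """Multiply-adds per output pixel for a square kernel_size x kernel_size filter."""
    return kernel_size * kernel_size


if __name__ == "__main__":
    image = [[1.0] * 4 for _ in range(4)]
    kernel = [[1.0] * 2 for _ in range(2)]
    filtered, ops = matched_filter(image, kernel)
    print(filtered[0][0], ops, multiply_adds_per_pixel(100))
```

The four nested loops make the cost model explicit; a production pipeline would replace them with an optimized library routine.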

For example, supporting a camera with a real-time data rate of 100 million
pixels per second (e.g. the LSST) requires a computer system that can
deliver 100 Gigaflops, which is approximately 1000 of today's state-of-the-art
workstations. Re-processing such a data stream could require millions of such
computers.
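
The arithmetic behind this example can be sketched as follows. The figures of 1000 operations per pixel and 100 Megaflops sustained per workstation are not stated in the text; they are the values implied by the quoted totals (100 Gigaflops for 100 million pixels per second, spread over 1000 workstations). The re-processing scenario (three years of data re-run in one day) is a purely hypothetical illustration of how the machine count scales with the desired turnaround.

```python
import math


def required_flops(pixels_per_second, ops_per_pixel):
    """Sustained operation rate needed to keep up with the data stream."""
    return pixels_per_second * ops_per_pixel


def workstations_needed(total_flops, flops_per_workstation):
    """Number of workstations needed to sustain total_flops."""
    return math.ceil(total_flops / flops_per_workstation)


def reprocessing_speedup(archive_days, deadline_days):
    """Factor by which re-processing must outpace real-time acquisition."""
    return archive_days / deadline_days


if __name__ == "__main__":
    # Values inferred from the totals quoted in the text (assumptions).
    real_time = required_flops(100_000_000, 1000)
    machines = workstations_needed(real_time, 100_000_000)
    # Hypothetical scenario: re-run three years of data in one day.
    speedup = reprocessing_speedup(3 * 365, 1)
    print(real_time, machines, math.ceil(machines * speedup))
```

Because the required machine count grows linearly with the speed-up over real time, a turnaround of about a day on several years of data pushes the estimate from thousands of workstations to roughly a million.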

2.2. Software Requirements 

The primary difficulty in developing high performance pipeline software is the 
large number of systems architecture issues (number of processors, total memory, 
network bandwidth, disk bandwidth, ...) that need to be considered in order to 
keep up with the data rate (see Figure 2). In other words, the software pipeline 
becomes highly tuned to the system, which in turn increases the size of the code 
and the expertise necessary to maintain the code. In addition, upgrading to a 
new system can require a significant re-write. All of these issues mean that there 
is very little code re-use across different data pipelines. 

Unfortunately, the need for highly tuned software will only grow in the 
future as next generation computers incorporate more complex features (e.g., 
multi-processor chips, on chip vector units, multi-threading, ...). While it is now 
possible to get 1% or 10% of a computer's peak performance with little or no
optimization, in the future an un-optimized program can be expected to give less
than 0.1% of peak performance. In other words, the price of not being optimized 
to the hardware will increase. 



3. Enabling Technologies 

The primary goals of a computer system architect are to manage complexity and 
to create systems that avoid the need for heroic programming efforts to meet 
the program milestones. In the context of a real-time data reduction pipeline 
this means developing simplified abstractions of the hardware and software so 
as to isolate the specifics of the application from the specifics of the hardware. 
In the previous section, the argument for complex, highly parallel computing
systems was presented. Without such systems, analyzing a single night's worth
of observations will take months, and re-processing a significant portion of a
survey will take significantly longer.

3.1. High Speed Interconnects 

One way to create less complex parallel computing systems is to invest in
extremely capable networks. Minimizing network usage is the single most common
optimization step performed in writing parallel software. High performance
networks greatly alleviate the pressure on the programmer to tailor the code to
the specific hardware. In other words, it is incumbent on the system architect
to assemble a "balanced" system (i.e. computation vs. communication). Fortunately,
the rapid rise of the Internet and cluster computing has driven the need for ever 
more capable interconnects. As a result a variety of technologies will be available 
for producing systems that are well balanced for data processing pipelines (see 
Figure 3). 
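
One simple way to quantify "balance" is to compare the time a node spends communicating with the time it spends computing for the same unit of work. The sketch below is an illustrative model only: the function names, the per-operation traffic parameter, and the threshold of 1.0 are assumptions, not quantities taken from the text or from Figure 3.

```python
def comm_to_compute_ratio(flops_per_node, link_bytes_per_second, bytes_moved_per_flop):
    """Ratio of communication time to computation time per operation.

    A ratio at or below 1.0 means the network can deliver data as fast as
    the processor consumes it (assuming communication overlaps computation).
    """
    return bytes_moved_per_flop * flops_per_node / link_bytes_per_second


def is_balanced(flops_per_node, link_bytes_per_second, bytes_moved_per_flop, threshold=1.0):
    """True if the node is not starved by its network link under this model."""
    ratio = comm_to_compute_ratio(flops_per_node, link_bytes_per_second, bytes_moved_per_flop)
    return ratio <= threshold


if __name__ == "__main__":
    # Hypothetical node: 1 Gigaflop processor on a 100 MB/s link.
    print(is_balanced(1e9, 1e8, 0.01))  # light network traffic per operation
    print(is_balanced(1e9, 1e8, 1.0))   # heavy network traffic per operation
```

In this model a faster link raises the amount of traffic per operation that the code can tolerate, which is the sense in which a capable network relieves the programmer of hardware-specific tuning.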

3.2. High Performance Parallel Software Libraries 

A fast method for implementing high performance data pipelines is to re-use 
already optimized code. The best way is to leverage existing libraries (e.g.
LAPACK, ScaLAPACK, FFTW, VSIPL, ...) developed by other communities (see
Figure 4). These software packages remove the majority of the effort required to 
achieve optimal performance on a given computer. In addition, it is important 
for the community to increase the capability of the data pipeline applications 
it has developed (e.g. IRAF, IDL, ...). Currently, these tools provide a variety 
of application specific functions. Unfortunately, they are not designed for
real-time parallel data pipelines. It would be highly beneficial to upgrade (and add
to) these tools so that they can exploit the hardware technology that will be 
required to effectively process large surveys. 
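
As a rough illustration of why an optimized FFT library can matter for a pipeline, the sketch below compares operation counts for direct and FFT-based 2D convolution. It is a back-of-the-envelope model, not a measurement and not a description of any existing pipeline. It assumes the common estimate of 5 M log2(M) floating point operations for a complex FFT of M points, one forward and one inverse transform per image (with the filter transform precomputed), 6 operations per complex multiply, and 2 operations per multiply-add in the direct method. All function names are hypothetical.

```python
import math


def direct_flops_per_pixel(kernel_size):
    """Direct convolution: one multiply and one add per kernel element."""
    return 2 * kernel_size * kernel_size


def fft_flops_per_pixel(image_size):
    """FFT-based convolution of an image_size x image_size image.

    Counts a forward and an inverse 2D FFT (5 * log2(M) operations per
    pixel each, with M = image_size**2) plus one complex multiply per pixel.
    """
    total_pixels = image_size * image_size
    per_fft = 5 * math.log2(total_pixels)
    return 2 * per_fft + 6


if __name__ == "__main__":
    direct = direct_flops_per_pixel(100)
    via_fft = fft_flops_per_pixel(4096)
    print(direct, via_fft, direct / via_fft)
```

Under these assumptions the FFT route needs far fewer operations per pixel for large kernels, but only if the transform itself is well tuned to the hardware, which is the work a library such as FFTW takes off the pipeline developer.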

4. Summary 

The NVO's core data mining and archive federation activities are heavily
dependent on the underlying data pipeline software necessary to translate the raw
data into scientifically relevant source detections. The data pipeline software
dictates the raw data storage and retrieval mechanisms, the meaning and format
of the fields in the source catalogs, and the ability of the NVO users to
re-analyze raw data for their own purposes. Increasing the performance of the
core data pipeline software so that it can address the needs of current and
future high data rate surveys is an important activity that should be addressed in
concert with the development of the NVO.



Kepner & McMahon



[Figure 1 image not recovered. Surviving labels: "ops/pixel"; "10^2-100^2
ops/pixel"; "Extended Source Analysis"; "10^2-100^2 ops/detect" (twice).]

Figure 1. Standard steps in a data reduction pipeline. The computing
requirements to fulfill the operation are shown below each step.

Figure 2. The complex memory hierarchy of a modern computer
system. Typically, a data reduction pipeline incorporates the details of
this memory hierarchy into the program, which limits portability and
re-use of pipeline software.

[Figure 3 table not recovered. Surviving footnotes:]

(1) Race++ ILK16P interconnect. Multiport configurations may have more links.

(2) Assumes 6U boards, 3-level crossbar tree. 9U would give fewer links per processor, same max. 

(3) Projected performance; assumes fully connected, single-level crossbar switch 

Figure 3. Current and future capabilities of various interconnect
technologies.

Figure 4. Summary of various existing software libraries and their
capabilities.



